
Initial commit: Rusty AI SDK - unified multi-provider AI framework #1

Merged
undivisible merged 16 commits into m from claude/rusty-ai-sdk-design-7bT48 on Apr 23, 2026
Conversation

@undivisible

Owner

Summary

This is the initial commit of the Rusty AI SDK, a comprehensive Rust framework for building AI applications with support for multiple providers (OpenAI, Anthropic Claude, Google Gemini, Ollama, and local runtimes) and unified abstractions for language models, embeddings, streaming, and tool calling.

Key Changes

Core Framework (rusty_ai)

  • Trait-based abstractions: LanguageModel, EmbeddingModel, Provider traits for pluggable implementations
  • Unified types: Prompt, Message, ContentPart, StreamEvent, GenerateResult for consistent API across providers
  • Streaming support: AiStream with StreamEvent enum for real-time response handling
  • Tool calling: ToolDefinition, ToolCallRequest, ToolChoice for structured function invocation
  • Extended thinking: ThinkingConfig enum supporting Anthropic adaptive thinking, Gemini budget-based, and Ollama reasoning modes
  • Capabilities system: Capability and CapabilitySet for runtime feature detection
  • Router: Dynamic request routing based on prompt/options conditions

Provider Implementations

  • OpenAI-compatible adapter (rusty_openai_compatible): Generic HTTP API adapter with SSE streaming and tool-call accumulation
  • ChatGPT wrapper (rusty_chatgpt): Pre-configured OpenAI client with well-known model constants
  • Claude (rusty_claude): Anthropic Messages API with streaming, system message separation, and thinking support
  • Gemini (rusty_gemini): Google Gemini with SSE streaming and structured output
  • Ollama (rusty_ollama): Local Ollama server integration with chat and embedding support
  • Local runtimes: Bridges for Gemini Nano (Android), Apple Foundation Models, Windows Phi Silica, and browser AI APIs

Utilities

  • Middleware system (rusty_middleware): Composable request/response interceptors (logging, caching, retry)
  • UI stream protocol (rusty_ui_stream): Frontend-friendly event format with SSE and NDJSON encoders
  • Testing utilities (rusty_testing): Mock models and providers with call recording for unit tests

Examples

  • Basic text generation, streaming, multimodal input, tool loops
  • Object/structured output generation and streaming
  • Router-based model selection
  • Local runtime integration patterns (Android, Apple, Windows)

Notable Implementation Details

  • Async-first design: All I/O operations use async_trait and tokio
  • Stream composition: Uses futures::stream for efficient event processing and transformation
  • Error handling: Unified AiError type with provider-specific context
  • Type safety: Serde-based serialization with careful null-handling for optional fields
  • Extensibility: Bridge pattern for platform-specific integrations (JNI, Swift, Win32)
  • Workspace structure: Modular crate organization with shared dependencies via workspace manifest

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq

claude and others added 16 commits March 30, 2026 10:06
Complete Cargo workspace with unified AI SDK architecture:

Core crates:
- rusty_ai: traits (LanguageModel, EmbeddingModel, Provider, Tool, Middleware),
  typed errors, streaming (futures::Stream), structured output, routing
- rusty_middleware: retry with backoff, logging, caching, middleware chain
- rusty_ui_stream: SSE + NDJSON encoders, versioned UI protocol
- rusty_testing: mock models/providers, stream assertions

Cloud providers:
- rusty_openai_compatible: generic OpenAI-compatible API adapter
- rusty_chatgpt: OpenAI ChatGPT (GPT-4o, o3-mini)
- rusty_claude: Anthropic Messages API (Sonnet, Opus, Haiku)
- rusty_gemini: Google Gemini API with multimodal support
- rusty_ollama: local Ollama server with NDJSON streaming

Local/platform runtimes (bridge-based, first-class):
- rusty_gemini_nano: Android Prompt API with session support
- rusty_foundationmodels: Apple Foundation Models
- rusty_phi_silica: Windows NPU Phi Silica
- rusty_browser: Chrome/Edge built-in AI for WASM targets

All crates compile cleanly against workspace dependencies.

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
…d output fixes

Core (rusty_ai):
- Add ThinkingConfig enum (Adaptive, Budget, Enabled) and ReasoningEffort to GenerateOptions
- Add ThinkingDelta and SyntheticStreamingNotice stream events
- Add ExtendedThinking, VideoInput, AudioInput, AudioOutput Capability variants
- SyntheticStreamer now emits SyntheticStreamingNotice before text chunks
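
A minimal sketch of the notice-before-chunks ordering described above. The `StreamEvent` subset, the `synthetic_stream` function, and the character-based chunking are assumptions made for illustration; the actual `SyntheticStreamer` may chunk differently and is presumably async.

```rust
/// Hypothetical subset of the SDK's `StreamEvent` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    /// Tells the consumer that the following chunks were cut from a finished response.
    SyntheticStreamingNotice,
    TextDelta(String),
}

/// Split an already complete response into chunks of `chunk_chars` characters,
/// prefixed by exactly one notice event.
pub fn synthetic_stream(full_text: &str, chunk_chars: usize) -> Vec<StreamEvent> {
    let chars: Vec<char> = full_text.chars().collect();
    let mut events = vec![StreamEvent::SyntheticStreamingNotice];
    // `chunks` panics on a size of 0, so clamp to at least 1.
    for chunk in chars.chunks(chunk_chars.max(1)) {
        events.push(StreamEvent::TextDelta(chunk.iter().collect()));
    }
    events
}
```

Emitting the notice first lets a UI decide up front whether to show a typing animation or render the whole text at once.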

rusty_claude:
- Fix ImageSource to support both base64 and URL sources (no more [image: url] fallback)
- Add structured output via output_config.format (json_schema, GA 2026 API)
- Add extended thinking via thinking field (adaptive mode)
- Add ThinkingDelta/SignatureDelta stream parser handling
- Update model IDs: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001

rusty_gemini:
- Update models to gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
- Add ThinkingConfig (thinking_budget/thinking_level) to GenerationConfig
- Add id field to FunctionCall/FunctionResponse (Gemini 3+ requirement)
- Add Thought part variant for thinking token streaming
- Add responseSchema/responseMimeType for structured output

rusty_ollama:
- Add think: Option<bool> to chat request (reasoning models: deepseek-r1, qwen3)
- Add thinking field to response (streaming + non-streaming)
- Pass full JSON Schema as format field for structured output (Ollama 2025+)
- Emit ThinkingDelta events from NDJSON stream parser

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
…uctured output

rusty_claude:
- Update models: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001
- Add ExtendedThinking + StructuredOutput capabilities
- Fix stream_parser: handle ThinkingDelta and SignatureDelta events
- Fix convert: pass thinking config and output_config to API

rusty_gemini:
- Update models: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
- Add ExtendedThinking, VideoInput, AudioInput capabilities
- Add thinking_config (budget/level) to generation config
- Add id field to FunctionCall/FunctionResponse (Gemini 3+ compat)
- Add Thought GeminiPart variant; emit ThinkingDelta from stream parser

rusty_ollama:
- Pass full JSON Schema as format field for structured output
- Add think flag propagation; emit ThinkingDelta from NDJSON stream

rusty_chatgpt:
- Add gpt-5.4, gpt-5.4-mini, gpt-5.4-nano models
- Add gpt54() and gpt54_mini() convenience methods

rusty_phi_silica:
- Add stream_tokens() to bridge trait (maps to GenerateResponseWithUpdatesAsync)
- Replace SyntheticStreamer with real chunk-based streaming in model

rusty_browser:
- Add BackingModel enum (GeminiNano vs PhiSilica)
- Add response_constraint to BrowserAiOptions (Chrome Prompt API)
- Add supports_response_constraint capability flag
- Update docs: window.ai deprecated, use LanguageModel global directly

rusty_ui_stream:
- Handle ThinkingDelta and SyntheticStreamingNotice events in encoder

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
rusty_browser: update BrowserAiBridge doc comments noting window.ai
deprecation in Chrome 138+, direct LanguageModel global usage, and
Edge/Phi Silica backing distinction

rusty_phi_silica: fix stream() to drive bridge.stream_tokens() directly
instead of calling generate() and wrapping in SyntheticStreamer

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
- All providers now use pub const for well-known model IDs
- Added Gemini 3 series: gemini-3.1-pro-preview, gemini-3-flash,
  gemini-3.1-flash-live-preview, gemini-embedding-2-preview
- Provider trait gains fetch_models() for dynamic API discovery
- GeminiProvider::list_remote_models() queries /v1beta/models
- ChatGptProvider::list_remote_models() queries /v1/models
- OllamaProvider already had list_models() via /api/tags

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
Core:
- Add SpeechToTextModel + TextToSpeechModel traits with
  TranscriptionResult, AudioResult, TtsOptions types
- Add ModelRegistry for caching dynamically fetched models
- Provider trait gains speech_to_text_model(), text_to_speech_model(),
  fetch_models() methods
- RouteCondition type alias fixes clippy::type_complexity
- StreamEvent::SyntheticStreamingNotice for local runtime awareness

Providers:
- ChatGPT: add WHISPER, TTS, TTS_HD, GPT_4O_REALTIME,
  GPT_4O_AUDIO, GPT_4O_MINI_REALTIME consts + AudioInput/AudioOutput
  capabilities for voice models
- All providers: rename consts to _LATEST suffix pattern with docs
  pointing users to fetch_models() for dynamic discovery

CI:
- Add .github/workflows/ci.yml with check, test, clippy (-Dwarnings),
  fmt, doc, and MSRV (1.75) jobs

Quality:
- Fix all clippy warnings across entire workspace
- Run cargo fmt --all
- GeminiRequestParts struct replaces complex return tuple

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
- MSRV 1.80 insufficient for transitive deps; bump to 1.85
- Fix unresolved rustdoc links in middleware.rs, provider.rs,
  and rusty_ui_stream lib.rs

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
Router::local_first was always returning true from its route condition,
making the cloud fallback unreachable. The closure now checks whether
the local model's CapabilitySet satisfies the request's needs (tool
calling, structured output) and falls through to cloud when it doesn't.
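
The corrected route condition can be sketched roughly as follows. `Capability`, `Request`, and `route_local` here are simplified stand-ins, not the SDK's real `CapabilitySet` or router types; the sketch only assumes that a request's needs can be derived from whether it carries tools or an output schema.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the SDK's `Capability` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    ToolCalling,
    StructuredOutput,
}

/// Hypothetical request summary: only the fields the route condition inspects.
pub struct Request {
    pub has_tools: bool,
    pub has_output_schema: bool,
}

/// Capabilities the request needs from whichever model serves it.
pub fn required(req: &Request) -> HashSet<Capability> {
    let mut needs = HashSet::new();
    if req.has_tools {
        needs.insert(Capability::ToolCalling);
    }
    if req.has_output_schema {
        needs.insert(Capability::StructuredOutput);
    }
    needs
}

/// Route condition: true only when the local model covers every need.
/// Returning false lets the router fall through to the cloud model.
pub fn route_local(local_caps: &HashSet<Capability>, req: &Request) -> bool {
    required(req).is_subset(local_caps)
}
```

A condition that always returns `true` is the degenerate case of this check, which is why the cloud fallback was never reached.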

CacheMiddleware was keying on the prompt alone, so requests with the
same prompt but different temperature, tools, output schema, or other
generation options incorrectly returned the same cached result. The key
now hashes all generation-affecting fields (numeric options via bit
patterns, serializable types via JSON, enum variants via Debug).
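
As an illustration of the three hashing strategies named above, here is a sketch using only the standard library. The `Options` struct is a hypothetical subset of the real generation options, and `tools_json` assumes the tools have already been serialized to a JSON string.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical enum option, hashed through its Debug representation.
#[derive(Debug, Clone, Copy)]
pub enum ToolChoice {
    Auto,
    Required,
}

/// Hypothetical subset of the generation options that affect the result.
pub struct Options {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub tool_choice: ToolChoice,
    /// Assumed to hold the tool definitions already serialized as JSON.
    pub tools_json: String,
}

pub fn cache_key(prompt: &str, opts: &Options) -> u64 {
    let mut h = DefaultHasher::new();
    prompt.hash(&mut h);
    // f32 does not implement Hash, so hash its bit pattern instead.
    opts.temperature.map(f32::to_bits).hash(&mut h);
    opts.max_tokens.hash(&mut h);
    // Enum variant via its Debug string.
    format!("{:?}", opts.tool_choice).hash(&mut h);
    // Serializable types via their JSON text.
    opts.tools_json.hash(&mut h);
    h.finish()
}
```

Hashing `Option<u32>` rather than the unwrapped value keeps `None` and `Some(0)` distinct.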

Tests added for both: four router routing scenarios and four cache
hit/miss/TTL scenarios using MockLanguageModel.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stable rustfmt versions differ between local (1.93) and CI runners,
causing spurious fmt failures. Nightly rustfmt is the conventional
choice for CI formatting checks. Also clear RUSTFLAGS for the fmt job.

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
Root cause: no rustfmt.toml meant different rustfmt versions
(local 1.93 vs CI stable) produced different output. Adding
rustfmt.toml with edition="2021" ensures deterministic formatting
regardless of toolchain version.

https://claude.ai/code/session_01NMtKKc9beRbzKEznEEzQZq
Four distinct silent-failure patterns fixed across all streaming providers:

1. SSE/NDJSON parse failures now terminate the stream with StreamError
   instead of logging a warning and continuing as if no data was lost.
   Affected: Claude (parse_sse_event), Gemini, Ollama (build_ndjson_stream),
   OpenAI-compatible.

2. Malformed tool-call JSON (accumulated from streaming deltas) now
   emits a StreamError/Error event rather than silently substituting
   an empty arguments object {}. Affected: Claude (ContentBlockStop),
   OpenAI-compatible (flush_pending_tools and inline flush).

3. Transport error source chain was being dropped (source: None) in
   the Claude and Gemini byte-stream error paths. The original
   reqwest::Error is now preserved via source: Some(Box::new(e)).

4. The OpenAI-compatible stream parser had an unreachable .unwrap() on
   a HashMap re-query to extract a call_id that was already bound as
   `_id` in the pattern match. Fixed to use the bound variable directly,
   removing the underscore suppressor.
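
Pattern 1 above (terminate on a parse failure instead of continuing) can be sketched with a plain iterator. The `text:<payload>` line format, `Event`, `parse_line`, and `events` are all invented for this illustration; the real parsers operate on SSE/NDJSON byte streams.

```rust
/// Hypothetical event type; the SDK's `StreamEvent` has more variants.
#[derive(Debug, PartialEq)]
pub enum Event {
    TextDelta(String),
    StreamError(String),
}

/// Hypothetical line format for illustration: `text:<payload>`.
pub fn parse_line(line: &str) -> Result<Event, String> {
    match line.strip_prefix("text:") {
        Some(rest) => Ok(Event::TextDelta(rest.to_string())),
        None => Err(format!("unparseable line: {:?}", line)),
    }
}

/// Yield events until the first parse failure, emit one `StreamError`, then stop.
pub fn events<'a>(
    lines: impl Iterator<Item = &'a str> + 'a,
) -> impl Iterator<Item = Event> + 'a {
    let mut failed = false;
    lines.map(parse_line).map_while(move |res| {
        if failed {
            // A previous line failed to parse: end the stream here.
            return None;
        }
        Some(match res {
            Ok(ev) => ev,
            Err(msg) => {
                failed = true;
                Event::StreamError(msg)
            }
        })
    })
}
```

The consumer sees the error as the final event, so it cannot mistake a truncated response for a complete one.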

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two issues fixed for every provider's non-2xx error path:

1. .unwrap_or_default() on response.text() silently produced a blank
   error message when the body couldn't be read (e.g. connection reset
   mid-response). All providers now log a warning and include the
   read-failure reason in the returned ProviderError message.

2. GeminiProvider::list_remote_models and ChatGptProvider::list_remote_models
   were discarding the HTTP status code (status: None) after checking it,
   preventing the retry middleware from distinguishing 429 from 500.
   Both now capture and forward status: Some(status_code).
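
To show why the forwarded status matters, here is a sketch of a retry decision that depends on it. The `ProviderError` shape, `RetryDecision`, and the backoff numbers are assumptions for illustration and do not describe the retry middleware's actual policy.

```rust
use std::time::Duration;

/// Hypothetical error shape with the optional HTTP status discussed above.
#[derive(Debug)]
pub struct ProviderError {
    pub status: Option<u16>,
    pub message: String,
}

#[derive(Debug, PartialEq)]
pub enum RetryDecision {
    RetryAfter(Duration),
    GiveUp,
}

/// Illustrative policy: back off exponentially on 429, retry quickly on 5xx,
/// and give up on anything else, including errors with no status at all.
pub fn decide(err: &ProviderError, attempt: u32) -> RetryDecision {
    match err.status {
        Some(429) => {
            // Cap the exponent so the multiplication cannot overflow.
            let millis = 1000 * 2u64.pow(attempt.min(5));
            RetryDecision::RetryAfter(Duration::from_millis(millis))
        }
        Some(s) if s >= 500 && s < 600 => RetryDecision::RetryAfter(Duration::from_millis(200)),
        _ => RetryDecision::GiveUp,
    }
}
```

With `status: None`, a rate-limited request and a server error both land in the last arm, which is the ambiguity the fix removes.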

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sion

RetryMiddleware: replace .expect() on last_error with .unwrap_or_else
returning a descriptive Transport error, so no panic occurs if the loop
invariant is somehow violated.
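
A synchronous sketch of the non-panicking fallback. The generic `retry` helper and the `String` error type are simplifications made for this example; the real middleware is async and returns a Transport error.

```rust
/// Run `op` up to `max_attempts` times, returning the first success
/// or the last error observed.
pub fn retry<T>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, String>,
) -> Result<T, String> {
    let mut last_error: Option<String> = None;
    for attempt in 0..max_attempts {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) => last_error = Some(e),
        }
    }
    // With `.expect()`, a loop that never ran (for example `max_attempts == 0`)
    // would panic here. The fallback turns that case into an ordinary error.
    Err(last_error.unwrap_or_else(|| "retry loop made no attempts".to_string()))
}
```

A zero attempt count is one concrete way the "loop always sets `last_error`" invariant can fail.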

LoggingMiddleware: success and error paths now both respect the
configured tracing level (previously error path always used ERROR,
ignoring with_level() settings). Error path now uses ?e (Debug format)
to preserve the full error source chain; Display was silently dropping
the underlying reqwest/IO cause.
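
The Display-versus-Debug difference can be reproduced with a small error type. `TransportError` and `chain` are hypothetical and only mirror the `source: Option<Box<...>>` field mentioned earlier; a `std::io::Error` stands in for the reqwest cause.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error that carries an optional underlying cause.
#[derive(Debug)]
pub struct TransportError {
    pub message: String,
    pub source: Option<Box<dyn Error + Send + Sync>>,
}

impl fmt::Display for TransportError {
    // Display prints only this error's own message, not the cause.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "transport error: {}", self.message)
    }
}

impl Error for TransportError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_ref().map(|e| &**e as &(dyn Error + 'static))
    }
}

/// Collect the Display text of an error and every cause beneath it.
pub fn chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        out.push(cause.to_string());
        current = cause.source();
    }
    out
}
```

Formatting this type with `{}` shows only the top-level message, while the derived `{:?}` output includes the nested `source` field. Walking `source()` as in `chain` is an alternative when a cleaner log line is wanted.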

CacheMiddleware: cache_key() now returns Option<u64>. If the prompt
cannot be serialized (returning None), process() bypasses the cache
entirely rather than hashing an empty DefaultHasher state, which
previously caused every un-serializable prompt to collide on a single
constant hash bucket.
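
A sketch of the bypass behaviour, under stated assumptions: `Prompt::Opaque` is an invented variant standing in for any prompt that fails serialization, and `Cache` is a toy synchronous cache rather than the real middleware.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical prompt type. `Opaque` stands in for content that cannot be serialized.
pub enum Prompt {
    Text(String),
    Opaque(u32),
}

/// Returns `None` when the prompt cannot be turned into a stable key.
pub fn cache_key(prompt: &Prompt) -> Option<u64> {
    match prompt {
        Prompt::Text(text) => {
            let mut h = DefaultHasher::new();
            text.hash(&mut h);
            Some(h.finish())
        }
        Prompt::Opaque(_) => None,
    }
}

pub struct Cache {
    entries: HashMap<u64, String>,
    pub model_calls: u32,
}

impl Cache {
    pub fn new() -> Self {
        Cache { entries: HashMap::new(), model_calls: 0 }
    }

    pub fn process(&mut self, prompt: &Prompt, model: impl Fn(&Prompt) -> String) -> String {
        let key = match cache_key(prompt) {
            Some(key) => key,
            None => {
                // No usable key: skip the cache instead of sharing one bucket.
                self.model_calls += 1;
                return model(prompt);
            }
        };
        if let Some(hit) = self.entries.get(&key) {
            return hit.clone();
        }
        self.model_calls += 1;
        let out = model(prompt);
        self.entries.insert(key, out.clone());
        out
    }
}
```

Under the old behaviour, both `Opaque` prompts would have hashed to the same value and the second would have received the first one's cached answer.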

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
generate_text returned AiError::Serialization when the model responded
with no text (tool-calls-only response). This is a provider response
characteristic, not a serialization error. Now returns ProviderError
with a clear message including the provider_id. Doc comment updated to
describe the failure mode.
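
Roughly, the new error mapping looks like the sketch below. The `AiError::Provider` variant shape, the `GenerateResult` fields, and the `text_or_error` helper are assumptions made for the example, not the crate's actual definitions.

```rust
/// Hypothetical error enum with a provider variant that names its origin.
#[derive(Debug, PartialEq)]
pub enum AiError {
    Provider { provider_id: String, message: String },
}

/// Hypothetical result shape: text is absent for tool-calls-only responses.
pub struct GenerateResult {
    pub text: Option<String>,
    pub tool_calls: Vec<String>,
}

/// Extract the text, or report a provider error when the model returned none.
pub fn text_or_error(provider_id: &str, result: GenerateResult) -> Result<String, AiError> {
    match result.text {
        Some(text) => Ok(text),
        None => Err(AiError::Provider {
            provider_id: provider_id.to_string(),
            message: format!(
                "model returned no text ({} tool call(s) only)",
                result.tool_calls.len()
            ),
        }),
    }
}
```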

ThinkingConfig::Adaptive doc comment referenced 'claude-opus-4-6+' which
is not a valid Anthropic model identifier. Replaced with a correct
description: 'claude-3-7-sonnet and later'.

OpenAiCompatibleModel::new() now delegates to try_new() which returns
AiResult<Self>, mapping the reqwest build failure to AiError::Transport
with the source chain preserved. new() wraps it with a descriptive
.expect() that names the actual failure condition (TLS unavailable).
Callers that need error recovery can use try_new() directly.
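
The `new()`/`try_new()` split follows a common Rust pattern, sketched here without reqwest. `Client`, `BuildError`, and the URL-scheme check are hypothetical; the scheme check merely stands in for the real failure condition (the HTTP client failing to build).

```rust
#[derive(Debug, PartialEq)]
pub struct BuildError(pub String);

#[derive(Debug)]
pub struct Client {
    pub base_url: String,
}

impl Client {
    /// Fallible constructor for callers that want to recover from failure.
    pub fn try_new(base_url: &str) -> Result<Self, BuildError> {
        if base_url.starts_with("https://") || base_url.starts_with("http://") {
            Ok(Client { base_url: base_url.to_string() })
        } else {
            Err(BuildError(format!("unsupported base URL: {}", base_url)))
        }
    }

    /// Convenience constructor: panics with a message naming the actual failure condition.
    pub fn new(base_url: &str) -> Self {
        Self::try_new(base_url)
            .expect("Client::new: base URL must start with http:// or https://")
    }
}
```

Keeping the validation in `try_new()` means both constructors share one code path and cannot drift apart.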

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@undivisible undivisible merged commit 4d44ac2 into m Apr 23, 2026
10 of 12 checks passed
@undivisible undivisible deleted the claude/rusty-ai-sdk-design-7bT48 branch April 23, 2026 10:27